22 May 2018

DATA DESCRIPTION

URL- https://www.kaggle.com/zusmani/us-mass-shootings-last-50-years/data

I found the dataset for this project on Kaggle, a website where datasets on a wide range of topics are shared publicly. It gives information on mass shootings in the United States of America from 1966 to 2017.

QUESTIONS

  1. Have mass shootings throughout the US increased or decreased over time?
  2. Have there been particular time periods with more shootings, and does there seem to be a trend?
  3. Which cities and areas are more prone to gun violence according to the dataset?
  4. Is there any correlation between shootings and the shooter's age, gender, or race?
  5. What is the most common motivation for gun violence?
  6. How are mental health issues related to the number of shootings?

Loading the data into a data frame

                                 V1
1: Mass_Shootings_Dataset_Ver_4.csv
[1] 1 1
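
As a minimal sketch of this step (not the code actually used in the report), the CSV can be read with base R and checked with `dim()`. The inline text and its column names are hypothetical stand-ins so that the example runs without the file. The missing-value summary later in this report covers 320 cases, so a result of `1 1` from `dim()` would suggest that the file name itself was read rather than the file's contents.

```r
# Stand-in for the CSV contents (hypothetical rows and column names).
csv_text <- "Title,Date,Fatalities,Injured
Incident A,6/12/2016,3,5
Incident B,8/1/1966,2,0"

# With the real file, the call would look something like:
# shootings <- read.csv("Mass_Shootings_Dataset_Ver_4.csv", stringsAsFactors = FALSE)
shootings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Check that the table has the expected number of rows and columns
dim(shootings)
str(shootings)
```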

Variables that are not relevant within the scope of this analysis.

-"Employed at" is not relevant to the questions I have posed, so I omitted it; it cannot contribute to the accuracy of my analysis.

-Although the question of whether or not the shooter was employed is relevant, the specifics of employment are not necessary to include.

-"Policeman Killed" and "Open Close Location" were also removed, as they were not relevant to my analysis.

Text-based variables

-Text-based variables offer useful detail, but only after the data has been processed.

-As Latitude and Longitude already provide a specific location, Incident Area is unnecessary.

-The Summary variable is overly specific and does not help with the analysis.
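
Dropping the columns named above could be sketched as follows. The toy data frame is hypothetical, and the column names follow the wording used in this report, which may not match the exact headers in the CSV.

```r
# Toy data frame; column names are assumptions based on the text above.
shootings <- data.frame(
  Title                 = c("Incident A", "Incident B"),
  Fatalities            = c(3, 5),
  `Employed at`         = c(NA, "Factory"),
  `Policeman Killed`    = c(0, 1),
  `Open Close Location` = c("Open", "Close"),
  `Incident Area`       = c("School", "Office"),
  Summary               = c("text", "text"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns judged not relevant to the questions posed
drop_cols <- c("Employed at", "Policeman Killed", "Open Close Location",
               "Incident Area", "Summary")

# Keep every column whose name is not in drop_cols
shootings_trimmed <- shootings[, !(names(shootings) %in% drop_cols), drop = FALSE]
names(shootings_trimmed)
```

Selecting by name rather than by position keeps the step valid if the column order in the file changes.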

Missing variables map

[1] 0.04375
[1] 0.1904762
[[1]]
# A tibble: 5 x 3
  n_miss_in_case n_cases pct_miss
           <int>   <int>    <dbl>
1              0      60   18.8  
2              1     241   75.3  
3              2       7    2.19 
4              3       9    2.81 
5              4       3    0.938
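
The counts in the table sum to 320 cases, and 60 / 320 = 18.75%, which matches the first row. The report lists naniar among its packages, and the figures above look like its missing-data summaries. The base R sketch below is a stand-in on hypothetical toy data, not the author's code. It assumes the two unlabeled leading values are the overall share of missing cells and the share of variables with at least one missing value.

```r
# Toy data with some missing values (hypothetical columns and rows).
shootings <- data.frame(
  Title      = c("A", "B", "C", "D"),
  Fatalities = c(3, 5, NA, 2),
  Gender     = c("Male", NA, NA, "Male"),
  Race       = c("White", "Black", NA, "Asian"),
  stringsAsFactors = FALSE
)

# Overall proportion of missing cells
prop_missing <- mean(is.na(shootings))

# Proportion of variables with at least one missing value
prop_vars_missing <- mean(colSums(is.na(shootings)) > 0)

# Number of missing values in each row (case), then tabulated
n_miss <- rowSums(is.na(shootings))
case_counts <- table(n_miss)
case_table <- data.frame(
  n_miss_in_case = as.integer(names(case_counts)),
  n_cases        = as.integer(case_counts),
  pct_miss       = 100 * as.integer(case_counts) / nrow(shootings)
)
case_table
```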

Missing variables map

-A substantial number of values in the "Employed (Y/N)" column are missing.

-Including this column would make the results in my plots inaccurate, so I removed it.

Race

-In the original dataset, the race column has 17 different values, some of which overlap. I combined the 4 variations that refer to White shooters into one, "White", to make analysis easier, and did the same for the other race categories.

-I ended up with just 6 race categories: White, Black, Asian, Latino, Native American or Alaska Native, and Unclear/Unknown.

-The same process was used for gender, ending up with 3 categories: Male, Female, and Unknown/Unclear.

-The same process was repeated for Mental Illness, ending up with 3 categories: Yes, No, and Unknown/Unclear.
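
The merging of overlapping labels described above could be sketched in base R as below. The raw labels are hypothetical examples, since the actual 17 spellings are not listed in this report, and `recode_race` is an illustrative helper rather than the function used in the analysis.

```r
# Hypothetical raw labels; the actual spellings in the dataset may differ.
raw_race <- c("White", "white", "White American or European American",
              "Black", "black", "Asian American", "Latino", "Unknown", NA)

# Illustrative helper: map each raw label to one of the 6 final categories.
recode_race <- function(x) {
  x <- tolower(trimws(x))
  # Anything not matched below (including NA) stays "Unclear/Unknown"
  out <- rep("Unclear/Unknown", length(x))
  out[grepl("white", x)]  <- "White"
  out[grepl("black", x)]  <- "Black"
  out[grepl("asian", x)]  <- "Asian"
  out[grepl("latino", x)] <- "Latino"
  out[grepl("native", x)] <- "Native American or Alaska Native"
  out
}

race_clean <- recode_race(raw_race)
table(race_clean)
```

The same pattern, with a different set of keywords, would apply to the gender and Mental Illness columns.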

Main Causes of Shooting

What is the most common motivation for gun violence?

-I found that the main causes of shootings in the US were psychosis, terrorism, and anger and frustration.

-Other causes, such as unemployment, revenge, and racism, are also quite common.

-Although one might expect a higher number given how many robberies we hear about, the number of shootings that occur during robberies is quite low.

Distribution of Shootings by year

Have mass shootings throughout the US increased or decreased over time?

-Have there been particular time periods with more shootings, and does there seem to be a trend?

-Interestingly, 2016 is the year with the most shootings, closely followed by 2015; both are far ahead of all other years from 1998 onward.

-According to this plot, shootings in the US have increased significantly: recent years such as 2013, 2014, 2015, and 2016 show a higher number of shootings than previous years.

-Over the whole period, however, there doesn't seem to be a specific trend in whether shootings have increased or decreased.
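
The per-year counts behind a plot like this could be computed as in the sketch below. The dates are hypothetical, and the month/day/year format is an assumption about how the date column is stored. The report lists lubridate among its packages, but base R date parsing is used here to keep the example self-contained.

```r
# Hypothetical incident dates in month/day/year format (an assumption).
dates <- c("6/12/2016", "7/7/2016", "12/2/2015", "4/20/1999", "8/1/1966")

# Parse the dates and extract the year
parsed <- as.Date(dates, format = "%m/%d/%Y")
years  <- as.integer(format(parsed, "%Y"))

# Count incidents per year, keeping years with zero incidents
all_years <- seq(min(years), max(years))
per_year  <- table(factor(years, levels = all_years))

# Year(s) with the most incidents
top_years <- names(per_year)[which(per_year == max(per_year))]
top_years

# Optional quick look at the distribution:
# barplot(per_year, las = 2, xlab = "Year", ylab = "Number of shootings")
```

Keeping the zero-count years makes gaps visible, which matters when judging whether there is a trend.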

Distribution of Shootings throughout US

Which cities and areas are more prone to gun violence according to the dataset?

-As shown by this map, there does not seem to be any specific trend in the distribution of shooting victims in the US.

-Most shootings are concentrated in coastal areas.

-Fewer shootings occur in the central areas.

-Shootings that do occur in the central areas seem to claim more victims than shootings in coastal areas.

Distribution of shooters with mental issues

How are mental health issues related to the number of shootings?

Distribution of shooters with gender

Gender of Shooter

-An overwhelmingly large number of shooters are male.

-Very few shooters are female, which is also reflected in the media.

CONCLUSION

-This dataset taught me many things about the distribution of shootings, and about how age, gender, race, mental health issues, and location relate to the number of shootings that occur.

-This dataset brought to light the sheer volume of shootings that unfortunately occur in the US. Although it was very useful for analyzing shootings, a dataset with fewer missing values and more useful variables, such as religion, abuse history, and criminal record (Y/N), would have allowed a deeper analysis of the cause and motive of these shootings.

REFERENCES

https://www.kaggle.com/zusmani/us-mass-shootings-last-50-years/data

RStudio (R version 3.4.2)

R packages: tidyverse, ggplot2, ggmap, ggthemes, plotly, visdat, naniar, maps, lubridate, stringr, tidyr, dbplyr, readr